A METHOD OF INTELLIGENT DATA ANALYSIS 
TO DETECT ABNORMAL USE OF UTILITIES IN BUILDINGS 
Background of the Invention

1. Field of the Invention 

[0001] The present invention relates to managing consumption of
utilities, such as electricity, natural gas and water; and more 
particularly to detecting the occurrence of abnormal usage. 

2. Description of the Related Art

[0002] Large buildings often incorporate computerized control
systems which manage the operation of different subsystems, such
as those for heating, ventilation and air conditioning. In
addition to ensuring that the subsystem performs as desired, the
control system operates the associated equipment as efficiently
as possible.

[0003] A large entity may have numerous buildings under
common management, such as on a university campus or a chain of
stores located in different cities. To accomplish this common
management, the controllers in each building gather data
regarding performance of the building subsystems, which data can
be analyzed at a central monitoring location.

[0004] With the cost of energy increasing, building owners 
are looking for ways to reduce utility consumption. In
addition, the cost of electricity for large consumers may be 
based on the peak use during a billing period. Thus high 
consumption of electricity during a single day can affect the 
rate at which the service is billed during an entire month. 
In addition, certain preferential rate plans require a customer 
to reduce consumption upon the request of the utility company, 
such as on days of large service demand throughout the entire 
utility distribution system. Failure to comply with the request 
usually results in stiff monetary penalties, which raise the
energy cost significantly above that for an unrestricted rate 
plan. Therefore, a consumer has to analyze the energy usage in 
order to determine the best rate plan and implement processes to 
ensure that operation of the facility does not inappropriately 
cause an increase in utility costs. 

[0005] In addition, abnormal energy or other utility 
consumption may indicate malfunctioning equipment or other 
problems in the building. Therefore, monitoring utility usage 
and detecting abnormal consumption levels can indicate when 
maintenance or replacement of the machinery is required. 
[0006] As a consequence, sensors are being incorporated into
building management systems to measure utility usage for the
entire building, as well as specific subsystems such as heating,
air conditioning and ventilation equipment. These management
systems collect and store massive quantities of utility use data,
which can overwhelm the facility operator when attempting to
analyze that data in an effort to detect anomalies.
[0007] Alarm and warning systems and data visualization
programs often are provided to assist in deriving meaningful
information from the gathered data. However, human operators
must select the thresholds for alarms and warnings, which is a
daunting task. If the thresholds are too tight, then numerous
false alarms are issued, and if the thresholds are too loose,
equipment or system failures can go undetected. The data
visualization programs can help building operators detect and
diagnose problems, but a large amount of time can be spent
detecting problems. Also, the expertise of building operators
varies greatly. New or inexperienced operators may have
difficulty detecting faults and the performance of an operator
may vary with the time of day or day of the week.
[0008] Therefore there is a need for robust data analysis 
methods to automatically determine if the current energy use is 
significantly different than previous energy patterns and if so, 
alert the building operator or mechanics to investigate and 
correct the problem. 



Summary of the Invention 

[0009] Abnormal utility usage by a building or a particular 
apparatus in the building can be determined by repeatedly 
measuring the level of use of the utility thereby producing a 
plurality of utility measurements. A Generalized Extreme 
Studentized Deviate (GESD) statistical procedure is applied to 
the plurality of utility measurements to identify any 
measurement outliers. The measurement outliers denote times 
when unusual utility consumption occurred, thereby indicating 
times during which operation of the building or the particular 
apparatus should be investigated. 

[0010] In the preferred embodiment, a severity of abnormal
utility usage can be established by determining a degree to
which the associated outlier deviates from the norm. This can
be accomplished by calculating robust estimates of the mean
($\bar{x}_{robust}$) and the standard deviation ($s_{robust}$)
of each outlier.

Brief Description of the Drawings 

[0011] FIGURE 1 is a block diagram of a distributed facility
management system which incorporates the present invention;

[0012] FIGURE 2 is a box plot of average electrical power
consumption for a building;

[0013] FIGURE 3 is a graph depicting the energy consumption
for a building; and

[0014] FIGURE 4 is a flowchart of the algorithm that analyzes
the energy consumption data for the building.



Detailed Description of the Invention 

[0015] With reference to Figure 1, a distributed facility 
management system 10 supervises the operation of systems in a 
plurality of buildings 12, 13 and 14. Each building contains 
its own building management system 16 which is a computer that 
governs the operation of various subsystems within the building. 
Each building management system 16 also is connected to numerous 
sensors throughout the building that monitor consumption of 
different utility services at various points of interest. For 
example, the building management system 16 in building 13 is 
connected to a main electric meter 17, the central gas meter 18 
and the main water meter 19. In addition, individual meters for 
electricity, gas, water and other utilities can be attached at 
the supply connection to specific pieces of equipment to measure 
their consumption. For example, water drawn into a cooling 
tower of an air conditioning system may be monitored, as well as 
the electric consumption of the pumps for that unit. 
[0016] Periodically the building management system 16
gathers data from the sensors and stores that information in a
database contained within the memory of the computer for the
building management system. The frequency at which the data is
gathered is determined by the operator of the building based on
the type of the data and the associated building function. The
utility consumption for functions with relatively steady state
operation can be sampled less frequently, as compared to
equipment with large variations in utility consumption.
[0017] The gathered data can be analyzed either locally by the
building management system 16 or forwarded via a communication
link 20 for analysis by a centralized computer 22. For example,
the communication link 20 can be a wide area computer network
extending among buildings in an office park or a university
campus, or the communication link may comprise telephone lines
extending between individual stores and the principal office of
a large retailer.

[0018] The present invention relates to a process by which 
the data acquired from a given building is analyzed to determine 
abnormal usages of a particular utility service. This is 
accomplished by reviewing the data for a given utility service 
to detect outliers, data samples that vary significantly from 
the majority of the data. The data related to that service is 
separated from all the data gathered by the associated building 
management system. That relevant data then is categorized based 
on the time periods during which the data was gathered. Utility 
consumption can vary widely from one day of the week to another. 
For example, a typical office building has relatively high 
utility consumption Monday through Friday when most workers are 
present, and significantly lower consumption on weekends. In 
contrast, a manufacturing facility that operates seven days a 
week may have similar utility consumption every day. However, 
different manufacturing operations may be scheduled on different 
days of the week, thereby varying the level of utility 
consumption on a daily basis. 
[0019] Therefore, prior to implementing the outlier analysis,
the building operator defines one or more groups of days having
similar utility consumption. That grouping can be based on a
knowledge of the building use, or from data regarding daily
average or peak utility consumption. For example, Figure 2 is a
box plot of the average daily electrical power consumption for
an exemplary building. A similar box plot can be generated for
the peak electrical power consumption. It is apparent from an
examination of this graph that consumption during weekdays
(Monday through Friday) is similar, i.e., the normal consumption
of electricity falls within one range of levels (A), and weekend
periods (Saturday and Sunday) also have similar consumption
levels that fall within a second range (B). Therefore, separate
utility consumption analyses would be performed on data from two
groups of days, weekdays and weekends. However, different day
groups would apply to a manufacturing plant in which high
utility consuming equipment is run only on Tuesdays, Thursdays
and Saturdays. In this latter example, Tuesdays, Thursdays and
Saturdays would be placed into one analysis group with the
remaining days of the week into a second group.
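
The day grouping described above can be sketched in a few lines of
Python. This is only an illustrative sketch: the names DAY_GROUPS and
group_by_day_type and the (date, value) sample layout are assumptions
made for the example and do not appear in the specification.

```python
from collections import defaultdict
from datetime import date

# Hypothetical day groups for an office building (assumption for this
# sketch). date.weekday() numbers Monday as 0 and Sunday as 6.
DAY_GROUPS = {
    "weekday": {0, 1, 2, 3, 4},
    "weekend": {5, 6},
}


def group_by_day_type(samples, day_groups=DAY_GROUPS):
    """Split (date, value) samples into lists keyed by day-group name."""
    grouped = defaultdict(list)
    for day, value in samples:
        for name, weekdays in day_groups.items():
            if day.weekday() in weekdays:
                grouped[name].append((day, value))
                break  # each day belongs to exactly one group
    return dict(grouped)
```

For the manufacturing plant example, the same function could be called
with a mapping such as {"heavy": {1, 3, 5}, "light": {0, 2, 4, 6}} so
that Tuesdays, Thursdays and Saturdays form one analysis group. Each
resulting group would then be analyzed separately.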

[0020] Figure 3 depicts the peak daily consumption for this
building over a period of four weeks. The weekday peaks are
significantly greater than the peak consumption on the weekends.
Point 30 represents a day when peak consumption of electricity
was abnormally high. This may have been caused by a large piece
of equipment turning on unexpectedly, for example an additional
chiller of an air conditioning system activating on a single
very hot day. The data value for this abnormally high level
is referred to as an "outlier" and building operators are
interested in finding such outliers and learning their cause.
Outliers often result from equipment or system control
malfunctions which require correction.

[0021] The daily usage pattern for each type of utility 
service can be different. For example, the electricity use in 
a manufacturing facility may be relatively uniform every day 
of the week, but a special gas furnace is operated only on 
certain days of the week. The grouping of days for analyzing 
electricity use in this facility will be different than the day 
groups for gas consumption. As a consequence, each utility 
being monitored is configured and analyzed independently. 
[0022] Focusing on one type of utility service, such as
electricity use for the entire building, acquisition of periodic
electric power measurements from the main electric meter 17
produces a set $X$ of $n$ data samples where
$X = \{x_1, x_2, x_3, \ldots, x_n\}$. The
analysis will find the elements in set $X$ that are outliers,
i.e., statistically significantly different than most of the
data samples. This determination uses a form of the Generalized
Extreme Studentized Deviate (GESD) statistical procedure
described by B. Rosner, in "Percentage Points for a Generalized
ESD Many-Outlier Procedure," Technometrics, Vol. 25, No. 2, pp.
165-172, May 1983.

[0023] Prior to the analysis the user needs to specify the
probability $\alpha$ of incorrectly declaring one or more
outliers when no outliers exist and an upper bound ($n_u$) on
the number of potential outliers. The probability $\alpha$
defines the sensitivity of the process and is redefined
periodically based on the number of false warnings that are
produced by the system finding outliers. In other words, the
probability is adjusted so that the number of outliers found
results in an acceptable level of warnings of abnormal utility
consumption within the given reporting period, recognizing that
false warnings cannot be eliminated entirely while still having
an effective evaluation technique. The upper bound ($n_u$)
specifies a maximum number of data samples in set $X$ that can
be considered to be outliers. This number must be less than
fifty percent of the total number of data samples, since by
definition a majority of the data samples cannot be outliers,
i.e., $n_u < 0.5(n-1)$. For example, an upper bound ($n_u$) of
thirty percent can be employed for electricity consumption
analysis.

[0024] The data analysis commences at step 40 by setting the
initial value $n_{out}$ for the number of outliers to zero. Then
at step 42 a FOR loop is defined in which the program execution
loops through steps 44-58 once for each potential outlier
permitted by the upper bound $n_u$, i.e., for
$i = 1, 2, 3, \ldots, n_u$. The arithmetic mean ($\bar{x}$) of
all the elements in set $X$ is calculated at the first step 44
of this loop. Then at step 46, the standard deviation ($s$) of
the elements in set $X$ is calculated.
[0025] If the standard deviation is not greater than zero
($s \not> 0$), i.e., the samples of utility usage are
substantially the same, as may occur in rare cases, then the
pass through the loop terminates at step 48 by returning to
step 42. Otherwise the execution of the algorithm advances to
step 50, at which the $i$-th extreme member in set $X$ is
located. That extreme element $x_{e,i}$ is the element in set
$X$ that is farthest from the mean $\bar{x}$. Using that extreme
element $x_{e,i}$, the computer 22 calculates the $i$-th extreme
studentized deviate $R_i$ at step 52 according to the
expression:

$$R_i = \frac{|x_{e,i} - \bar{x}|}{s} \qquad (1)$$

The $i$-th $100\alpha$ percent critical value $\lambda_i$ then
is calculated at step 54 using the equation:

$$\lambda_i = \frac{(n-i)\, t_{n-i-1,p}}{\sqrt{(n-i+1)\left(n-i-1+t_{n-i-1,p}^{2}\right)}} \qquad (2)$$

where $t_{n-i-1,p}$ is the $p$-th percentile of the student's
t-distribution with $(n-i-1)$ degrees of freedom, and the
percentile $p$ is determined from:

$$p = 1 - \frac{\alpha}{2(n-i+1)} \qquad (3)$$
[0026] Abramowitz and Stegun, Handbook of Mathematical
Functions with Formulas, Graphs, and Mathematical Tables, Dover
Publications, Inc., New York, 1970, provides a process for
determining the student's t-distribution value $t_{v,p}$ for the
$p$-th percentile of a t-distribution with $v$ degrees of
freedom. This determination begins by estimating the
standardized normal deviate $z_p$ at the $p$-th percentile,
according to:

$$z_p = t - \frac{2.515517 + 0.802853\,t + 0.010328\,t^{2}}{1 + 1.432788\,t + 0.189269\,t^{2} + 0.001308\,t^{3}} \qquad (4)$$

where

$$t = \sqrt{\ln\frac{1}{(1-p)^{2}}} \qquad (5)$$



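[0027] The student's t-distribution value $t_{v,p}$ is estimated
from $z_p$ and the degrees of freedom $v$ using the following
expressions:

$$g_1 = \frac{1}{4}\left(z_p^{3} + z_p\right) \qquad (6)$$

$$g_2 = \frac{1}{96}\left(5z_p^{5} + 16z_p^{3} + 3z_p\right) \qquad (7)$$

$$g_3 = \frac{1}{384}\left(3z_p^{7} + 19z_p^{5} + 17z_p^{3} - 15z_p\right) \qquad (8)$$

$$g_4 = \frac{1}{92160}\left(79z_p^{9} + 776z_p^{7} + 1482z_p^{5} - 1920z_p^{3} - 945z_p\right) \qquad (9)$$

$$t_{v,p} \approx z_p + \frac{g_1}{v} + \frac{g_2}{v^{2}} + \frac{g_3}{v^{3}} + \frac{g_4}{v^{4}} \qquad (10)$$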
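
The approximation of equations (4) through (10) can be sketched in
Python as follows. This is a minimal sketch, not a definitive
implementation: the function names normal_deviate and student_t are
hypothetical, and the form of $t$ in equation (5) is assumed to be the
upper-tail form from the cited handbook, which restricts the sketch to
percentiles $0.5 \le p < 1$.

```python
import math


def normal_deviate(p):
    """Approximate the standardized normal deviate z_p, eqs. (4)-(5).

    Assumes 0.5 <= p < 1 (an assumption of this sketch).
    """
    t = math.sqrt(math.log(1.0 / (1.0 - p) ** 2))           # eq. (5)
    num = 2.515517 + 0.802853 * t + 0.010328 * t**2
    den = 1.0 + 1.432788 * t + 0.189269 * t**2 + 0.001308 * t**3
    return t - num / den                                      # eq. (4)


def student_t(p, v):
    """Approximate t_{v,p} from z_p and v degrees of freedom, eqs. (6)-(10)."""
    z = normal_deviate(p)
    g1 = (z**3 + z) / 4                                       # eq. (6)
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96                  # eq. (7)
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384    # eq. (8)
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5
          - 1920 * z**3 - 945 * z) / 92160                    # eq. (9)
    return z + g1 / v + g2 / v**2 + g3 / v**3 + g4 / v**4     # eq. (10)
```

Because the series is an expansion in powers of $1/v$, its accuracy can
be expected to degrade when the number of degrees of freedom is very
small.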
[0028] Upon solving equations (1) and (2), if at step 56 the
$i$-th extreme studentized deviate $R_i$ is greater than the
$i$-th $100\alpha$ percent critical value $\lambda_i$
($R_i > \lambda_i$), then the $i$-th extreme data sample
$x_{e,i}$ is an outlier and the number of outliers equals $i$.

[0029] At step 58, the extreme element $x_{e,i}$ is removed from
set $X$ and the number of elements in that set now equals $n-i$.
The algorithm then returns to step 42 to repeat the process and
hunt for another outlier. Eventually the number of passes
through the loop reaches the upper bound ($n_u$), at which point
the FOR loop terminates by branching to step 60. At that point,
the outliers have been identified with a set of outliers given
by $X_{out} = \{x_{e,1}, x_{e,2}, \ldots, x_{e,n_{out}}\}$. If no
outliers were found in set $X$, then $X_{out}$ is an empty set.
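
The loop of steps 40 through 60 can be sketched in Python as shown
below. This is an illustrative sketch under stated assumptions rather
than the definitive implementation: the names gesd_outliers and
_student_t, the default of thirty percent for $n_u$, and the early exit
when the standard deviation is zero (later passes would see the same
unchanged data) are all choices made for the example.

```python
import math
import statistics


def _student_t(p, v):
    """Approximate t_{v,p} with the series of equations (4)-(10)."""
    t = math.sqrt(math.log(1.0 / (1.0 - p) ** 2))
    z = t - ((2.515517 + 0.802853 * t + 0.010328 * t**2)
             / (1.0 + 1.432788 * t + 0.189269 * t**2 + 0.001308 * t**3))
    g1 = (z**3 + z) / 4
    g2 = (5 * z**5 + 16 * z**3 + 3 * z) / 96
    g3 = (3 * z**7 + 19 * z**5 + 17 * z**3 - 15 * z) / 384
    g4 = (79 * z**9 + 776 * z**7 + 1482 * z**5
          - 1920 * z**3 - 945 * z) / 92160
    return z + g1 / v + g2 / v**2 + g3 / v**3 + g4 / v**4


def gesd_outliers(samples, alpha=0.05, n_u=None):
    """Return the outliers found by the GESD loop (steps 40-60)."""
    remaining = list(samples)
    n = len(remaining)
    if n_u is None:
        n_u = int(0.3 * n)            # assumed default: thirty percent
    if n_u > 0 and n_u >= 0.5 * (n - 1):
        raise ValueError("n_u must satisfy n_u < 0.5 * (n - 1)")
    removed = []
    n_out = 0                                                 # step 40
    for i in range(1, n_u + 1):                               # step 42
        mean = statistics.mean(remaining)                     # step 44
        s = statistics.stdev(remaining)                       # step 46
        if not s > 0:                                         # step 48
            break  # data unchanged, so later passes would not differ
        extreme = max(remaining, key=lambda x: abs(x - mean))  # step 50
        r_i = abs(extreme - mean) / s                  # step 52, eq. (1)
        p = 1.0 - alpha / (2.0 * (n - i + 1))          # eq. (3)
        t = _student_t(p, n - i - 1)
        lam = (n - i) * t / math.sqrt(
            (n - i + 1) * (n - i - 1 + t * t))         # step 54, eq. (2)
        if r_i > lam:                                  # step 56
            n_out = i
        remaining.remove(extreme)                      # step 58
        removed.append(extreme)
    return removed[:n_out]                             # step 60: X_out
```

The first $n_{out}$ removed elements are returned, so an extreme value
that fails the test on an early pass is still reported if a later pass
succeeds, which is the behavior described for step 56.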

[0030] After the outliers have been identified, a robust
estimate of the mean ($\bar{x}_{robust}$) and a standard
deviation ($s_{robust}$) for the set of $n$ data samples
$X = \{x_1, x_2, x_3, \ldots, x_n\}$ are calculated at steps
64 and 66. In essence this determines how far the outliers
deviate from the remainder of the data and thus represents the
severity of the abnormal utility consumption denoted by each
outlier. The process for making this determination commences
with the set of outliers $X_{out}$ and the set ($X_{non\_out}$)
of the data samples from set $X$ that are not outliers.
Specifically:

$$X_{non\_out} = \{x \mid x \in X \text{ and } x \notin X_{out}\} \qquad (11)$$
[0031] The robust estimate of the mean ($\bar{x}_{robust}$) is
the average value of the elements in set $X_{non\_out}$, as
given by:

$$\bar{x}_{robust} = \frac{\sum_{j} x_j}{n - n_{out}} \qquad (12)$$

where $x_j \in X_{non\_out}$.



[0032] The robust estimate of the standard deviation
($s_{robust}$) is the sample standard deviation of the elements
in set $X_{non\_out}$, as defined by the expression:

$$s_{robust} = \sqrt{\frac{\sum_{j}\left(x_j - \bar{x}_{robust}\right)^{2}}{n - n_{out} - 1}} \qquad (13)$$

[0033] The robust estimates of the mean ($\bar{x}_{robust}$) and
the standard deviation ($s_{robust}$) quantify the severity of
the abnormal utility usage represented by the corresponding
outlier. These values can be plotted to provide a graphical
indication as to that severity by which the building operator is
able to determine whether investigation of the cause is
warranted.

[0034] For days with abnormal energy consumption, the robust
estimates of the mean ($\bar{x}_{robust}$) and the standard
deviation ($s_{robust}$) are used to determine how different the
energy use is from the typical day. One measure is a robust
estimate of the number of standard deviations from the average
value:

$$z_j = \frac{x_{e,j} - \bar{x}_{robust}}{s_{robust}} \qquad (14)$$

where $x_{e,j}$ is the energy consumption for the $j$-th
outlier, $\bar{x}_{robust}$ is a robust estimate of the average
energy consumption for days of the same day type as outlier $j$,
and $s_{robust}$ is a robust estimate of the standard deviation
of energy consumption for days of the same day type.

[0035] The operator can be presented with tables or graphs
that show the outliers and the amount of variation for the
outliers.
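
The severity calculation can be sketched in Python as follows. This is
a minimal sketch under stated assumptions: the function name
robust_severity is hypothetical, the outliers are assumed to have been
identified already for one day type, and the non-outlier samples are
assumed not to be all identical (otherwise the robust standard
deviation would be zero).

```python
import math


def robust_severity(samples, outliers):
    """Return (mean_robust, s_robust, z_scores) for one day type."""
    non_out = list(samples)
    for x in outliers:
        non_out.remove(x)              # X_non_out: X without X_out
    m = len(non_out)                   # n - n_out
    mean_robust = sum(non_out) / m
    s_robust = math.sqrt(
        sum((x - mean_robust) ** 2 for x in non_out) / (m - 1))
    # Number of robust standard deviations each outlier lies from
    # the robust mean.
    z_scores = [(x - mean_robust) / s_robust for x in outliers]
    return mean_robust, s_robust, z_scores
```

A table or graph for the operator could then list each outlier next to
its score, with a positive score indicating consumption above the
typical level and a negative score indicating consumption below it.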